[opt](rowset) Aggregate non-MOW segment key bounds to reduce rowset meta size #62604
Conversation
For non-MOW (duplicate / aggregate key) tables, per-segment key bounds are not consumed on the read path — only the rowset-level [min, max] is used by the reader and ordered compaction. In cloud mode, persisting bounds for every segment can blow past FDB's value size limit on commit_rowset.

Introduce an `enable_aggregate_non_mow_key_bounds` BE config (default off). When enabled, non-MOW rowsets collapse per-segment bounds into a single [overall_min, overall_max] entry at write time, and compaction preserves this behavior. MOW rowsets always retain per-segment bounds — their `lookup_row_key` path relies on them for delete bitmap computation, and is guarded by a new DCHECK against aggregated input.

A new optional `segments_key_bounds_aggregated` flag is added to both RowsetMetaPB and RowsetMetaCloudPB so consumers can distinguish aggregated from per-segment layouts. Proto round-trip, pb_convert, snapshot restore, and index builder all preserve both this flag and the existing `segments_key_bounds_truncated` flag.

Correctness notes:
- `first_key`/`last_key` callers (`block_reader`, ordered compaction) already bail out on overlapping rowsets, so for non-overlapping rowsets the aggregated [min, max] equals seg[0].min / seg[last].max exactly.
- `merge_rowset_meta` (MOW partial-update publish) DCHECKs both sides are non-aggregated.
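The write-time collapse described above can be sketched as follows. This is a minimal illustration only: the `KeyBounds` struct and `aggregate_key_bounds` function are simplified stand-ins for the real `KeyBoundsPB` and the logic inside `RowsetMeta::set_segments_key_bounds`, not the actual Doris API.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for Doris's KeyBoundsPB (hypothetical).
struct KeyBounds {
    std::string min_key;
    std::string max_key;
};

// Collapse per-segment bounds into a single [overall_min, overall_max] entry,
// as the non-MOW write path does when the config is enabled.
std::vector<KeyBounds> aggregate_key_bounds(const std::vector<KeyBounds>& per_segment) {
    if (per_segment.size() <= 1) return per_segment;  // nothing to collapse
    std::string overall_min = per_segment.front().min_key;
    std::string overall_max = per_segment.front().max_key;
    for (const auto& kb : per_segment) {
        overall_min = std::min(overall_min, kb.min_key);
        overall_max = std::max(overall_max, kb.max_key);
    }
    return {KeyBounds{overall_min, overall_max}};
}
```

For non-overlapping segments the loop degenerates to taking seg[0].min and seg[last].max, which is why `first_key`/`last_key` consumers observe identical values before and after aggregation.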
Thank you for your contribution to Apache Doris. Please clearly describe your PR:

/review
Pull request overview
This PR introduces an optional optimization to reduce rowset metadata size for non-MOW (duplicate/aggregate-key) tables by aggregating per-segment key bounds into a single rowset-level [min, max] entry, primarily targeting cloud mode FDB value size limits.
Changes:
- Add `enable_aggregate_non_mow_key_bounds` BE config (default off) and use it in rowset write + ordered compaction to optionally aggregate non-MOW `segments_key_bounds`.
- Add `segments_key_bounds_aggregated` to RowsetMeta protobufs and propagate it through pb_convert, snapshot restore, index builder, and rowset meta helpers.
- Add BE unit tests and a regression suite validating aggregated/non-aggregated behavior for non-MOW vs MOW tables.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| regression-test/suites/data_model_p0/duplicate/test_non_mow_key_bounds_aggregation.groovy | Regression coverage for config on/off and MOW vs non-MOW layout expectations. |
| gensrc/proto/olap_file.proto | Adds segments_key_bounds_aggregated flag to RowsetMetaPB/RowsetMetaCloudPB. |
| be/test/storage/rowset/rowset_meta_test.cpp | Unit tests for aggregation behavior and truncation interaction. |
| be/src/storage/task/index_builder.cpp | Preserves aggregated/truncated flags when rebuilding rowset meta for index tasks. |
| be/src/storage/tablet/base_tablet.cpp | Adds DCHECK to ensure MOW lookup path never sees aggregated bounds. |
| be/src/storage/rowset/rowset_meta.h | Adds aggregated flag accessors and extends set_segments_key_bounds API. |
| be/src/storage/rowset/rowset_meta.cpp | Implements aggregation in set_segments_key_bounds; adds DCHECK in merge_rowset_meta. |
| be/src/storage/rowset/rowset.h | Exposes is_segments_key_bounds_aggregated() on Rowset. |
| be/src/storage/rowset/beta_rowset_writer.cpp | Aggregates non-MOW key bounds at write time based on new config. |
| be/src/storage/compaction/compaction.cpp | Aggregates non-MOW key bounds for ordered compaction output based on new config. |
| be/src/common/config.h / be/src/common/config.cpp | Declares/defines enable_aggregate_non_mow_key_bounds. |
| be/src/cloud/pb_convert.cpp | Propagates aggregated flag between Doris and cloud rowset meta PBs. |
| be/src/cloud/cloud_snapshot_mgr.cpp | Preserves aggregated/truncated flags during snapshot rowset meta creation. |
run buildall
Force-pushed from 1b267ba to 89e4675.
run buildall
OpenCode automated review failed and did not complete. Error: the review step failed (possibly a timeout or cancellation). Please inspect the workflow logs and rerun the review after the underlying issue is resolved.
FE UT Coverage Report: Increment line coverage 🎉
run buildall
BE UT Coverage Report: Increment line coverage, Increment coverage report
BE Regression && UT Coverage Report: Increment line coverage, Increment coverage report
FE Regression Coverage Report: Increment line coverage
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested. |
/review
I found one blocking correctness issue.

1. The new aggregated key-bounds flag is not actually preserved in the index rewrite path. The flag copied onto the rebuilt rowset meta is immediately overwritten by the following call, which recomputes the flag as unset even when the bounds already contain the single aggregated entry. Rebuilding an aggregated non-MOW rowset via ADD/DROP INDEX therefore emits metadata whose aggregated flag is cleared while only a single bounds entry remains.

Checkpoint summary:
- Goal: The PR is meant to aggregate non-MOW key bounds and preserve the new layout/flag across write, compaction, cloud-conversion, restore, and index-rewrite paths. The current head still misses the index-rewrite path, so the goal is not fully met.
- Tests: Unit coverage and regression coverage for write-time aggregation look good. I did not find coverage for ADD/DROP INDEX on an already aggregated non-MOW rowset, which is why this regression slips through.
- Concurrency/lifecycle: I did not find a new concurrency or lifecycle issue in the touched paths.
- Config: The new mutable BE config is observed in the write and ordered-compaction paths. Note that the code currently defaults it to on, which is broader than the PR text's "default off".
- Compatibility/persistence: The proto fields and cloud/restore conversion paths are preserved correctly in the current head.
- Performance: The intended metadata-size reduction looks fine where the aggregated flag is preserved.
Correction to my earlier review text: the blocking issue is in be/src/storage/task/index_builder.cpp. rowset_meta->set_segments_key_bounds_aggregated(input_rowset_meta->is_segments_key_bounds_aggregated()) is immediately overwritten by the next rowset_meta->set_segments_key_bounds(key_bounds) call. When the input rowset is already aggregated, key_bounds contains the single aggregated entry, but set_segments_key_bounds(key_bounds) runs with aggregate_into_single=false, so the rebuilt rowset loses the aggregated flag while still keeping num_segments > 1 and segments_key_bounds.size() == 1. Ordered compaction relies on the aggregated flag to preserve that layout when the config is off, so ADD/DROP INDEX can emit inconsistent rowset metadata here. Please pass the source flag into set_segments_key_bounds(key_bounds, input_rowset_meta->is_segments_key_bounds_aggregated()) or remove the standalone setter, and add a test that rebuilds an already-aggregated non-MOW rowset through the index path.
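The ordering hazard described in that correction can be sketched with a simplified stand-in class. `FakeRowsetMeta` and its members are hypothetical shapes for illustration, not the real RowsetMeta API; only the clobbering behavior mirrors the reviewed code.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct KeyBounds {  // stand-in for KeyBoundsPB (hypothetical)
    std::string min_key;
    std::string max_key;
};

// Simplified stand-in for RowsetMeta (hypothetical shape, not the real class).
struct FakeRowsetMeta {
    std::vector<KeyBounds> bounds;
    bool aggregated = false;

    void set_segments_key_bounds_aggregated(bool v) { aggregated = v; }

    // Mirrors the reviewed behavior: the setter recomputes the flag from its
    // own argument, so any value copied in beforehand is silently discarded.
    void set_segments_key_bounds(std::vector<KeyBounds> kb,
                                 bool aggregate_into_single = false) {
        if (aggregate_into_single && kb.size() > 1) {
            KeyBounds merged{kb.front().min_key, kb.front().max_key};
            for (const auto& b : kb) {
                merged.min_key = std::min(merged.min_key, b.min_key);
                merged.max_key = std::max(merged.max_key, b.max_key);
            }
            kb = {merged};
        }
        bounds = std::move(kb);
        // Recomputed unconditionally: this overwrites a flag set earlier via
        // set_segments_key_bounds_aggregated(), which is the reported bug.
        aggregated = aggregate_into_single;
    }
};
```

The suggested fix is then simply to forward the source rowset's flag as the second argument instead of calling the standalone setter first, so the final assignment computes the correct value.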
run buildall |
Cloud UT Coverage Report: Increment line coverage, Increment coverage report
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
For non-MOW (duplicate / aggregate key) tables, per-segment key bounds are not consumed on the read path — only the rowset-level [min, max] is used by the reader and ordered compaction. In cloud mode, persisting bounds for every segment can blow past FDB's value size limit on commit_rowset.
Introduce an `enable_aggregate_non_mow_key_bounds` BE config (default off). When enabled, non-MOW rowsets collapse per-segment bounds into a single [overall_min, overall_max] entry at write time, and compaction preserves this behavior. MOW rowsets always retain per-segment bounds — their `lookup_row_key` path relies on them for delete bitmap computation, and is guarded by a new DCHECK against aggregated input.

A new optional `segments_key_bounds_aggregated` flag is added to both RowsetMetaPB and RowsetMetaCloudPB so consumers can distinguish aggregated from per-segment layouts. Proto round-trip, pb_convert, snapshot restore, and index builder all preserve both this flag and the existing `segments_key_bounds_truncated` flag.

Correctness notes:
- `first_key`/`last_key` callers (`block_reader`, ordered compaction) already bail out on overlapping rowsets, so for non-overlapping rowsets the aggregated [min, max] equals seg[0].min / seg[last].max exactly.
- `merge_rowset_meta` (MOW partial-update publish) DCHECKs both sides are non-aggregated.

Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)